ABSTRACT

The Bayesian decision method has several features which
are desirable in the sampling inspection process for quality
control. These features include: (1) comparison of the value
of sample information in the decision process with the cost
of obtaining the information; (2) basing decisions on their
consequences to the decision maker; and (3) allowing the use
of subjective information in the decision process. In this
paper the Bayesian decision procedure as it applies to
variables sampling for quality control is examined. The
basic method is developed for both simultaneous and sequential
sampling and the modeling of decision consequences is
discussed. Various models for the production process are
provided and solutions for the generalized linear model
obtained. Finally, the incorporation of subjective information
is discussed.
I. INTRODUCTION


One of the primary concerns in the quality control of
items produced by or received from a production line is the
procedure by which decisions are made concerning the quality
of the material produced. In order to develop a decision
procedure the abstract term "quality" must be operationally
defined. This definition usually takes the form of an
equipment specification which lists the characteristics required
for the unit to perform its intended function. In the case of
100% inspection every unit produced or received is subjected
to a test in which these characteristics are measured. Based
on the test results the decision procedure is to accept those
units which satisfy the specification and reject those which
do not. When sampling inspection is used, a sample of the
production output is tested. The test results of the sample
are then used to make an accept or reject decision
concerning the population or lot from which the sample was drawn.

The decision procedure in this case requires that a
decision function be specified which indicates, for the test
results observed, the decision to be made (accept or reject).
Unlike 100% inspection, in sample testing there always exists
the possibility that the decision function may indicate an
erroneous decision.





The consequences of erroneous decisions represent a
loss to the decision maker and can range from mild to severe.
As an example, suppose a machine is judged to be out of
calibration when in fact it is not. Then the loss to the
decision maker would be the cost of a needless recalibration,
which may be small. On the other hand, if production quality
were judged to be acceptable when in fact it was not,
consumers might seek alternate sources of supply. This could
result in the loss of entire production contracts and
reputation. Although the above examples are oversimplified, the
main idea is that erroneous decisions always represent a loss
to the decision maker. Thus in the design of a decision
procedure the loss due to erroneous decisions must be
considered.

Another consideration in the design of a decision
procedure is the cost of testing the sample units. This
cost includes the labor and test facilities required and may
include the cost of the units themselves (if the tests are
destructive) or repair costs if the units fail. These costs
may be large if complex facilities are required or test time
is long. If the cost of testing is larger than the
anticipated consequences of a decision then the cost of information
is greater than its value in the decision process. Under
these circumstances, gathering further information (testing)
is counterproductive. Thus a decision procedure should
indicate the "value" of additional information to the decision
maker.





The most common method of specifying a decision
procedure is based on classical hypothesis testing. In this
approach two points on the operating characteristic (OC)
curve for the decision procedure are specified. The OC curve
for the procedure is the probability of acceptance versus
equipment quality. The required sample size and reject/accept
criteria are then developed based on the sampling distribution
using a likelihood ratio test. Another method which provides
greater flexibility and has features absent in the classical
method is the Bayesian decision approach. In the Bayesian
method the decision procedure is optimized for a loss function
specified by the user which reflects the particular
application. If the loss function and the cost of testing are
expressed in the same units the cost of information can be
obtained. The Bayesian method also contains the classical
procedure as a special case. Another feature of the Bayesian
method is that specific knowledge of the behavior of the
production process as well as subjective information can be
incorporated, thus allowing the decision procedure to adapt to
changing requirements.

In the following sections the basic Bayesian decision
procedure will be outlined and the specification of loss
functions and models for the production process discussed.
The generalized linear model is introduced and the recursion
equations developed to facilitate calculation of posterior
distributions. Finally, the incorporation of subjective
information is discussed.





II. THE BAYESIAN METHOD


A. THE GENERALIZED BAYESIAN DECISION PROCEDURE 

In order to discuss the Bayesian method as applied to a
production line, a generalized model of the production and
sample test process is required. Let $\theta$ represent the
characteristic of the equipment upon which decisions are to be
made. For example, $\theta$ would be the average or mean gain of a
production lot of amplifiers. The actual value of $\theta$ is not
observable; however, we can perform tests which indicate the
gain of an individual amplifier. Let $x$ indicate the results
of such a test. Also let $\theta$ and $x$ be related through a known
probability density function denoted by $f(x|\theta)$. As a model
of the generalized production process we assume a random
process such that for each time $t$, $\theta$ has a continuous
distribution. It is also assumed that there exists a time increment
$\Delta t$ for which $\theta$ is constant, i.e., $\theta(t) = \theta_i$ for
$t_{i-1} < t \le t_i$, where $t_i - t_{i-1} = \Delta t$.
This assumption implies that given a production increment of $n$
items produced during $\Delta t$, test results for each unit are
samples from $f(x|\theta_i)$. Figure 1 shows how $\theta$ might vary for the
generalized process. As shown in the figure, $\theta$ for the
increment $\Delta t$ is a fixed but unknown quantity. It is the units
produced during each $\Delta t$ which are the object of the decision
process. At each end point $t_0, t_1, t_2, \ldots$ a decision must be
made to either accept or reject the units produced in the
preceding time increment. The decisions will be based on the
sample data ($x$) which provides information concerning the
true value of $\theta$. At time $t$ let the probability density
$f(\theta_t)$ represent the uncertainty as to the true value of $\theta_t$.
Since the value of $\theta$ at time $t$ may or may not be independent
of the previous values of $\theta$ ($\theta_{t-1}, \theta_{t-2}, \ldots$), let the
uncertainty of $\theta_t$ given ($\theta_{t-1}, \theta_{t-2}, \ldots$) be expressed by means of
the conditional density $f(\theta_t|\theta_{t-1}, \theta_{t-2}, \ldots)$, which is known
or can be estimated but is not necessarily the same for all $t$.
Since the decisions (accept or reject) are to be based on the
characteristic $\theta$, it is necessary to specify a loss function
which indicates the consequences of a particular decision
when a specific value of $\theta$ obtains, for all possible values
of $\theta$ and all decisions. Let $\Theta$ represent the set of all
possible values of $\theta$ and let $D$ represent the set of all
decisions $d$. Then let $L(\theta,d)$ be the loss function, which is
known for all $d \in D$ and $\theta \in \Theta$. The loss for a particular decision
$d \in D$ depends on the actual value of $\theta$, but $\theta$ is unknown. The
expected loss of decision $d$ would be the product of $L(\theta,d)$
times the probability that $\theta$ obtains, summed over all possible
values of $\theta$. Let $\rho(d)$ denote the expected loss or risk of
decision $d$; then:

$$\rho(d) = \int_\Theta L(\theta,d)\, f(\theta)\, d\theta. \qquad (1)$$
The optimal decision in terms of minimizing the risk is the
decision $d \in D$ which minimizes equation (1) and is denoted by
$d^*$; therefore:

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta)\, d\theta. \qquad (2)$$

Thus given a loss function $L(\theta,d)$ and the distribution of
$\theta$, $f(\theta)$, the optimal decision is defined by equation (2).
The optimal decision ($d^*$) is usually referred to as the Bayes
decision against $f(\theta)$. Since the decisions (accept or reject)
will be based on the sample data $x = (x_1, x_2, \ldots, x_n)$, it is
desirable to specify a decision function, denoted by $\delta(x)$,
which for every value of $x$ observed specifies the decision
$d^*$, i.e., the decision which minimizes the risk. Let $S_x$ be
the set of all possible test results. Then the risk of the
decision function $\delta(x)$ is, by equation (1),

$$\rho(\delta(x)) = \int_\Theta \int_{S_x} L(\theta,\delta(x))\, f(x|\theta)\, f(\theta)\, dx\, d\theta. \qquad (3)$$


We want to find the decision function $\delta^*(x)$ which minimizes
the risk as expressed in equation (3). Assuming $L(\theta,\delta(x))$ is
bounded, interchanging the order of integration yields:

$$\rho(\delta(x)) = \int_{S_x} \left[ \int_\Theta L(\theta,\delta(x))\, f(x|\theta)\, f(\theta)\, d\theta \right] dx. \qquad (4)$$

Equation (4) is minimized when the expression in brackets is
minimized for each value of $x \in S_x$. Thus the optimal decision
function $\delta^*(x)$ would specify a decision $d^*$ which minimizes
the integral

$$\int_\Theta L(\theta,d)\, f(x|\theta)\, f(\theta)\, d\theta. \qquad (5)$$


From Bayes theorem

$$f(\theta|x) = \frac{f(x|\theta)\, f(\theta)}{f(x)} \qquad (6)$$

where $f(x) = \int_\Theta f(x|\theta)\, f(\theta)\, d\theta$ is a constant given a sample
value for $x$. Then minimizing the integral of (5) is
equivalent to selecting a decision $d^*$ which minimizes:

$$\int_\Theta L(\theta,d)\, \frac{f(x|\theta)\, f(\theta)}{f(x)}\, d\theta = \int_\Theta L(\theta,d)\, f(\theta|x)\, d\theta \qquad (7)$$

for each value of $x$ observed. Thus it is not necessary to
determine in advance a decision function $\delta(x)$ which specifies
$d^*$ for all possible values of $x$. As each $x$ is observed the
posterior, $f(\theta|x)$, is calculated from the prior, $f(\theta)$, by
equation (6) and $d^*$ is chosen to satisfy

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta|x)\, d\theta. \qquad (8)$$

This is the same result as equation (2) except that the
posterior based on the test data $x$ is used instead of the
prior. Thus equation (2) defines the optimal decision $d^*$
before and after sampling as long as the appropriate value for
$f(\theta)$ is used.
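Equations (6) and (8) can be illustrated numerically. The following is a minimal sketch, not the author's implementation. It assumes a normal sampling density with known standard deviation, a prior discretized on a grid, and a hypothetical symmetric linear loss about a break-even value `THETA_0`; all numbers and function names are illustrative only.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_on_grid(thetas, prior, xs, sd_x):
    """Equation (6) on a grid: weights proportional to f(x|theta) f(theta)."""
    unnorm = []
    for theta, p in zip(thetas, prior):
        likelihood = 1.0
        for x in xs:  # observations are independent given theta
            likelihood *= normal_pdf(x, theta, sd_x)
        unnorm.append(likelihood * p)
    total = sum(unnorm)  # plays the role of f(x)
    return [u / total for u in unnorm]

def bayes_decision(thetas, weights, loss, decisions):
    """Equations (2) and (8): the decision with minimum expected loss."""
    risks = {d: sum(loss(t, d) * w for t, w in zip(thetas, weights))
             for d in decisions}
    best = min(risks, key=risks.get)
    return best, risks[best]

# Hypothetical numbers: theta is a mean gain, 10.0 the break-even value.
THETA_0 = 10.0

def loss(theta, d):
    """Assumed loss: accepting hurts below THETA_0, rejecting above it."""
    if d == "accept":
        return max(THETA_0 - theta, 0.0)
    return max(theta - THETA_0, 0.0)

thetas = [8.0 + 0.01 * i for i in range(401)]  # grid on [8, 12]
raw = [normal_pdf(t, 10.2, 0.5) for t in thetas]
norm = sum(raw)
prior = [r / norm for r in raw]

d_prior, risk_prior = bayes_decision(thetas, prior, loss, ("accept", "reject"))
posterior = posterior_on_grid(thetas, prior, [9.4, 9.6, 9.5], sd_x=0.4)
d_post, risk_post = bayes_decision(thetas, posterior, loss, ("accept", "reject"))
print(d_prior, d_post)
```

With these assumed numbers the prior favors acceptance, while three low readings move the posterior below the break-even value and the Bayes decision changes to rejection. The same `bayes_decision` routine is used before and after sampling, with only the weights changing.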

The next step is to determine whether the decision should
be made without sampling, based on the prior distribution
$f(\theta)$, or the sample tested and the decision based on the
posterior distribution $f(\theta|x)$. In order to determine which
action is optimal, the risk of obtaining an additional sample
and then proceeding in an optimal fashion must be obtained.
If the risk of selecting a decision immediately is greater
than the risk of obtaining a sample result and then proceeding
in an optimal manner, then the sample should be tested since
this is the minimum risk action. Let $\rho(\phi,x)$ denote the risk
of obtaining a sample $x$ when the prior of $\theta$ is $\phi$ and then
proceeding in an optimal manner. Also let $\rho(\phi,d^*)$ denote the
risk of making decision $d^*$ when the prior of $\theta$ is $\phi$. Then
the following decision rule will be used:

If $\rho(\phi,d^*) > \rho(\phi,x)$, test the sample; otherwise, make decision $d^*$.   (9)

$\rho(\phi,d^*)$ is obtained from equation (2), where $\phi = f(\theta)$:

$$\rho(\phi,d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta)\, d\theta. \qquad (10)$$

To determine $\rho(\phi,x)$ two cases must be distinguished: the
samples are tested simultaneously or sequential sampling is
used.

1. Simultaneous Sampling

It has been assumed that the cost of obtaining
sample results is not zero. Therefore let $C(x)$ represent the
cost of testing the units 1, 2, ..., n, where the test results are
represented by the vector $x = (x_1, x_2, \ldots, x_n)$. In many
cases the cost of testing is independent of the values
obtained, in which case the cost would be just $C_n$, the cost of
testing $n$ units. The expected loss of testing $n$ units and
then making the optimal decision $d^*$, plus the cost of testing,
is $\rho(\phi,x)$. The distribution of $\theta$ if $x$ is observed will be
$f(\theta|x)$; thus from equation (10):

$$\rho(\phi,x) = \int_{S_x} \left[ \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta|x)\, d\theta \right] f(x)\, dx + E[C(x)]. \qquad (11)$$

Thus the decision procedure for the production lot would be
as follows:
1. Prior to testing determine $\rho(\phi,d^*)$ from equation
(10) based on the prior $f(\theta)$.
2. Determine the risk of testing, $\rho(\phi,x)$, from
equation (11).
3. If $\rho(\phi,d^*) \le \rho(\phi,x)$ make decision $d^*$; otherwise,
4. Test sample units to obtain data $x$ and make
decision $d^*$ according to equation (8).
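The four steps above can be sketched for one concrete case. The example below is an illustration under assumptions not made in the text: a normal prior on $\theta$, normal observations with known standard deviation, a two-sided linear loss with hypothetical slopes `k_accept` and `k_reject`, and a constant cost per unit tested. Under these assumptions the sample mean is sufficient, so the integral over $S_x$ in equation (11) reduces to a one-dimensional integral, evaluated here by the midpoint rule.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def expected_positive_part(mu, sd):
    """E[max(U, 0)] for U ~ N(mu, sd^2)."""
    z = mu / sd
    return mu * STD.cdf(z) + sd * STD.pdf(z)

def decision_risk(mean, var, theta0, k_accept, k_reject):
    """Equation (10) for theta ~ N(mean, var) under the assumed linear loss."""
    sd = math.sqrt(var)
    risk_accept = k_accept * expected_positive_part(theta0 - mean, sd)
    risk_reject = k_reject * expected_positive_part(mean - theta0, sd)
    return min(risk_accept, risk_reject)

def sampling_risk(mean, var, sd_x, n, cost_per_unit, theta0,
                  k_accept, k_reject, steps=2000):
    """Equation (11) with C(x) = n * cost_per_unit (an assumed cost model)."""
    var_xbar = var + sd_x ** 2 / n                 # marginal variance of x-bar
    post_var = 1.0 / (1.0 / var + n / sd_x ** 2)   # posterior variance
    marginal = NormalDist(mean, math.sqrt(var_xbar))
    lo = mean - 8.0 * math.sqrt(var_xbar)
    width = 16.0 * math.sqrt(var_xbar) / steps
    total = 0.0
    for i in range(steps):
        xbar = lo + (i + 0.5) * width
        post_mean = post_var * (mean / var + n * xbar / sd_x ** 2)
        total += (decision_risk(post_mean, post_var, theta0, k_accept, k_reject)
                  * marginal.pdf(xbar) * width)
    return total + n * cost_per_unit

def should_test(mean, var, sd_x, n, cost_per_unit, theta0, k_accept, k_reject):
    """Decision rule (9): test only if testing lowers the risk."""
    now = decision_risk(mean, var, theta0, k_accept, k_reject)
    later = sampling_risk(mean, var, sd_x, n, cost_per_unit,
                          theta0, k_accept, k_reject)
    return now > later, now, later

# Hypothetical lot: cheap testing versus expensive testing.
cheap = should_test(10.1, 0.25, 0.4, 5, 0.5, 10.0, 100.0, 100.0)
dear = should_test(10.1, 0.25, 0.4, 5, 10.0, 10.0, 100.0, 100.0)
```

In this sketch the only difference between `cheap` and `dear` is the cost per unit, which is enough to flip rule (9) from "test" to "decide now".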


2. Sequential Sampling

The risk of sampling in a sequential procedure
differs from the simultaneous case because after a sample is
tested two actions are possible: (a) make a decision or (b)
continue sampling. Because of this, determining the risk of
a sequential procedure is, in general, more difficult than
determining the risk of the simultaneous case just discussed.
In most cases of practical interest the sample size has a
fixed upper bound. Let $n$ denote the maximum number of samples
available for testing. Then the risk of testing the first
unit and proceeding in an optimal fashion is the risk of the
$n$ step sampling procedure where after each sample is tested
the risk of continuing is compared with the risk of choosing
a decision and the minimum of the two risks is chosen. After
the first sample is taken, the risk of choosing a decision
must be compared to the risk of continuing with the $n-1$ step
sampling procedure. Thus as each sample is drawn the risk of
continuing changes due to the change in the number of samples
remaining as well as the new prior based on the samples
observed. The above process may be viewed as a decision tree,
shown in Figure 2, which depicts the sequential decision
process for $n = 4$. At each step $k$, $k = 0, 1, 2, \ldots, n$, the
risk of making an immediate decision is denoted by $\rho(\phi_k,d^*)$,
where $\phi_k$ is the distribution of $\theta$ based on $k$ samples, and is
defined in equation (10). At each step this must be compared
with the risk of continuing the sequential test process and
the minimum risk action chosen. The risk of continuing at
each step is denoted by $\rho(\phi_k,X_{k+1})$, where
$X_{k+1} = (x_{k+1}, x_{k+2}, \ldots, x_n)$, indicating the dependence on the current prior
and the remaining samples.

The difficulty alluded to earlier is in obtaining
values for $\rho(\phi_k,X_{k+1})$. The general solution procedure uses
a backward induction starting at the last step and working
backward to the first step to obtain the continuation risk at
each step. For the $n = 4$ case depicted in Figure 2 the
procedure would be as follows. At step 3, after three samples have
been observed, the optimal action would be the minimum of
$\rho(\phi_3,d^*)$ and $\rho(\phi_3,X_4)$, where $\rho(\phi_3,d^*)$ is the risk of the
optimal decision given $\phi_3$ as defined in equation (10) and
$\rho(\phi_3,X_4)$ is the risk of obtaining an additional sample $x_4$
and then making the optimal decision. Since a decision must
be made after $x_4$ is observed this is the same as the sampling
risk for simultaneous sampling defined in equation (11) with
$x$ equal to $x_4$. Thus the risk at step 3 is a function of $\phi_3$,
denoted by $\rho(\phi_3)$, and would be:

$$\rho(\phi_3) = \min[\rho(\phi_3,d^*),\ \rho(\phi_3,X_4)]. \qquad (12)$$


The risk at step 2 will be the minimum of the decision risk
$\rho(\phi_2,d^*)$ and the continuation risk $\rho(\phi_2,X_3)$. The continuation
risk $\rho(\phi_2,X_3)$ is the expected value of the risk at step 3
based on the sample $x_3$. Thus:

$$\rho(\phi_2,X_3) = E[\rho(\phi_3(x_3))] + C_3 = \int_{S_{x_3}} \min[\rho(\phi_3(x_3),d^*),\ \rho(\phi_3(x_3),X_4)]\, f(x_3)\, dx_3 + C_3 \qquad (13)$$

where: $\phi_3(x_3)$ is $\phi_2$ given $x_3$, i.e., $f_2(\theta|x_3)$;
$f(x_3) = \int_\Theta f(x_3|\theta)\, f_2(\theta)\, d\theta$;
$C_3$ = expected cost of obtaining value $x_3$.

Thus at step 2, the optimal action again being that with
minimal risk, the risk is

$$\rho(\phi_2) = \min[\rho(\phi_2,d^*),\ \rho(\phi_2,X_3)]. \qquad (14)$$

This approach is then repeated for step 1, which yields

$$\rho(\phi_1,X_2) = E[\rho(\phi_2(x_2))] + C_2 = \int_{S_{x_2}} \min[\rho(\phi_2(x_2),d^*),\ \rho(\phi_2(x_2),X_3)]\, f(x_2)\, dx_2 + C_2 \qquad (15)$$

and

$$\rho(\phi_1) = \min[\rho(\phi_1,d^*),\ \rho(\phi_1,X_2)]. \qquad (16)$$

Thus at step 0, the beginning of the procedure, the risk of
the entire sequential test procedure would be

$$\rho(\phi_0) = \min[\rho(\phi_0,d^*),\ \rho(\phi_0,X_1)] \qquad (17)$$

where $\rho(\phi_0,X_1) = E[\rho(\phi_1(x_1))] + C_1$ is the risk of the entire
sequential sampling plan.

As seen from the above discussion, determining the
risk of a sequential procedure is a non-trivial exercise,
the degree of difficulty depending on the sample size $n$, the
loss function $L(\theta,d)$ and the sampling and parameter
distributions, $f(x|\theta)$ and $f(\theta)$. Examples of the above procedure for
sequential testing may be found in the open literature
[1, 2 and 3].
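The backward induction of equations (12) through (17) can be sketched for a simple case. The example below is an illustration under stated assumptions rather than a general solution: the prior is normal, the observations are normal with known standard deviation, the loss is a symmetric linear loss with hypothetical slope `k`, and each unit costs a constant `unit_cost` to test. In that setting the posterior after each sample is summarized by its mean (its variance is known in advance), so the risk at each step can be tabulated on a grid of posterior means and interpolated.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

STD = NormalDist()  # standard normal

def linear_risk(mean, sd, theta0, k):
    """Risk of the better of accept/reject when theta ~ N(mean, sd^2)."""
    def part(mu):  # E[max(U, 0)] for U ~ N(mu, sd^2)
        return mu * STD.cdf(mu / sd) + sd * STD.pdf(mu / sd)
    return k * min(part(theta0 - mean), part(mean - theta0))

def interp(grid, values, x):
    """Piecewise-linear interpolation, clamped at the grid ends."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    j = bisect_right(grid, x)
    w = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
    return (1.0 - w) * values[j - 1] + w * values[j]

def sequential_risk(m0, v0, sd_x, n, unit_cost, theta0, k,
                    grid_pts=241, nodes=41):
    """Step-0 risk rho(phi_0) of an n-step plan, by backward induction."""
    # Posterior variance after j samples (known before any data are seen).
    var = [1.0 / (1.0 / v0 + j / sd_x ** 2) for j in range(n + 1)]
    half = 6.0 * math.sqrt(v0)
    grid = [m0 - half + 2.0 * half * i / (grid_pts - 1) for i in range(grid_pts)]
    # Equal-probability nodes of the standard normal, used to average
    # over the next sample.
    z = [STD.inv_cdf((j + 0.5) / nodes) for j in range(nodes)]
    # Step n: no samples remain, so a decision must be made.
    rho = [linear_risk(m, math.sqrt(var[n]), theta0, k) for m in grid]
    for step in range(n - 1, -1, -1):
        # Standard deviation of the next posterior mean given the current one.
        spread = math.sqrt(var[step] - var[step + 1])
        new = []
        for m in grid:
            cont = sum(interp(grid, rho, m + spread * zj) for zj in z) / nodes
            stop = linear_risk(m, math.sqrt(var[step]), theta0, k)
            new.append(min(stop, cont + unit_cost))
        rho = new
    return interp(grid, rho, m0)

# Hypothetical numbers only.
immediate = linear_risk(10.1, math.sqrt(0.25), 10.0, 100.0)
seq4 = sequential_risk(10.1, 0.25, 0.4, 4, 0.5, 10.0, 100.0)
seq1 = sequential_risk(10.1, 0.25, 0.4, 1, 0.5, 10.0, 100.0)
costly = sequential_risk(10.1, 0.25, 0.4, 4, 1e6, 10.0, 100.0)
```

Because each step takes the minimum of stopping and continuing, allowing more samples can never raise the computed risk, and when testing is prohibitively expensive the plan collapses to an immediate decision.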

3. Comparison of Sequential Versus Simultaneous Sampling

In the design of a quality control procedure the
method of sampling must be specified. In order to determine
which of the two methods is preferred in a given situation
the risks of the two procedures should be compared. In the absence
of mitigating circumstances such as ease of implementation,
increased complexity, etc., the decision as to which sampling
scheme, sequential or simultaneous, is optimal should be
based on the risks of the two procedures. That is, if $\rho(s^*)$
is the risk of the optimal sampling scheme $s^*$ then

$$\rho(s^*) = \min[\rho(s_0),\ \rho(s_1)] \qquad (18)$$

where $s_0$ = sequential sampling
      $s_1$ = simultaneous sampling


In many cases the decision as to which sampling
scheme is superior is obvious due to the nature of the test.
For example, if the cost of testing is constant regardless of 
the number of units tested then simultaneous testing would 
provide minimum risk. If however the cost of testing were 
only a function of the number of units tested then sequential 
testing would be superior. It is when the cost of testing 
assumes some combination of the two extremes that the optimal 
choice becomes unclear, in which case the risks of each 


procedure must be compared to determine the optimal approach. 


B. APPLICATION TO THE GENERALIZED PRODUCTION PROCESS

In order to gain insight into the use and requirements of
the Bayesian decision method, the implementation of the
procedure on the generalized process of Figure 1 will now be
discussed for the simultaneous sampling case.

At time $t_0, t_1, t_2, \ldots$ the following assumptions are
made.

1. A production sample of size $n_i$ from production
lot $Q_i$ is available for testing.
2. Samples are independent given $\theta_i$ and the sampling
density $f(x|\theta_i)$ is known.
3. The cost of testing the $n_i$ units, $C_{n_i}$, is known.
4. A loss function $L(\theta,d)$ is specified for all $d \in D$
and $\theta \in \Theta$.
5. At time $t_0$, prior to sampling, the distribution
of $\theta_0$ is known and denoted by $f(\theta_0)$.
6. The conditional distribution of $\theta_i$ given
($\theta_{i-1}, \theta_{i-2}, \ldots$) is known and denoted by $f(\theta_i|\theta_{i-1})$,
where a Markov dependence is assumed for
the sequence ($\theta_0, \theta_1, \theta_2, \ldots$).
At time $t_0$, prior to sampling, two actions are feasible:
either make a decision (accept or reject) or test the sample
to gain information. If a decision is made without sampling
the risk will be $\rho(d^*)$ as defined by equation (8). The risk
of testing the sample $x = (x_1, x_2, \ldots, x_n)$ and making the
optimal decision, $\rho(\phi,x)$, is given by equation (11). Assume
that sampling represents the minimal risk action. After the
sample result is obtained, the prior $f(\theta_0)$ must be revised
and the optimal decision $d^*$ chosen. Denote the posterior or
new prior based on the data sample by $f(\theta_0|x)$; then by Bayes
theorem

$$f(\theta_0|x) = \frac{f(x|\theta_0)\, f(\theta_0)}{f(x)}.$$

The posterior $f(\theta_0|x)$ is now used to determine the optimal
decision $d^*$, by equation (10):

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta_0,d)\, f(\theta_0|x)\, d\theta_0.$$

After the decision is made on lot $Q_0$ at $t_0$ the procedure steps
to lot $Q_1$ at $t_1$. In order to determine the appropriate
actions concerning this lot the distribution $f(\theta_1)$ must be
obtained. It is at this point that the model of the
production process is used. The relationship between $\theta_0$ and $\theta_1$
must be known in order to determine the density $f(\theta_1)$ based
on the posterior $f(\theta_0|x)$. The relationship between $\theta_1$ and $\theta_0$
is specified by the conditional density $f(\theta_1|\theta_0)$, which is
obtained from the model of the production process. Methods
by which this density may be obtained from the production
model are discussed in a later section. Given that $f(\theta_1|\theta_0)$
is known, $f(\theta_1)$ prior to sampling from lot $Q_1$ is obtained
as follows:

$$f(\theta_1) = \int_\Theta f(\theta_1|\theta_0)\, f(\theta_0|x)\, d\theta_0. \qquad (19)$$
Q 


Using this value as the prior for $\theta_1$, $\rho(d^*)$ and $\rho(\phi,x)$
are obtained using equations (10) and (11) as before and the
decision rule (9) is applied. If the decision rule indicates
that the risk can be lowered by sampling, the procedure as
outlined for $\theta_0$ is followed. If, however, no sampling is
required then decision $d^*$ is made and the procedure advances
to $t_2$. At $t_2$, $f(\theta_2)$ must be obtained based on $f(\theta_0|x)$ since
no samples were observed at time $t_1$. The density $f(\theta_2)$ would
be determined as follows:

$$f(\theta_2) = \int_\Theta f(\theta_2|\theta_0)\, f(\theta_0|x)\, d\theta_0 \qquad (20)$$

where $f(\theta_2|\theta_0) = \int_\Theta f(\theta_2|\theta_1)\, f(\theta_1|\theta_0)\, d\theta_1$.
After $f(\theta_2)$ is determined, $\rho(d^*)$ and $\rho(\phi,x)$ are obtained as
before and the decision rule applied. The entire procedure
is then repeated to determine if samples should be tested at
$t_3, t_4, \ldots$, etc.
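The lot-to-lot update of equation (19) is a one-dimensional integral and can be checked numerically. The sketch below is illustrative only: the posterior for lot $Q_0$ and the transition density are assumed to be normal, with hypothetical parameters `A` and `SD_W`, so that the numerical result can be compared with the known closed form for a linear normal transition.

```python
import math
from statistics import NormalDist

def next_prior_density(theta_next, posterior, transition_pdf, lo, hi, steps=4000):
    """Equation (19) by midpoint integration over theta_0 in [lo, hi].

    posterior      -- callable giving f(theta_0 | x)
    transition_pdf -- callable (theta_next, theta_prev) giving f(theta_1 | theta_0)
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta_prev = lo + (i + 0.5) * width
        total += (transition_pdf(theta_next, theta_prev)
                  * posterior(theta_prev) * width)
    return total

# Hypothetical posterior for lot Q_0 and an assumed transition
# theta_1 = A * theta_0 + w, with w ~ N(0, SD_W^2).
post = NormalDist(9.6, 0.2)
A, SD_W = 0.9, 0.3

def transition(theta_next, theta_prev):
    return NormalDist(A * theta_prev, SD_W).pdf(theta_next)

numeric = next_prior_density(8.7, post.pdf, transition, 9.6 - 1.6, 9.6 + 1.6)
closed_form = NormalDist(A * 9.6, math.sqrt(A * A * 0.2 ** 2 + SD_W ** 2)).pdf(8.7)
```

When a lot is passed without sampling, equation (20) amounts to applying the same integration a second time, with the result of the first pass taking the place of the posterior.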

Under the decision process described it may be possible
that no sampling would be required for several production
lots. At first thought this may seem contrary to the objective
of minimizing the decision risk. However, if the production sequence
$\theta_0, \theta_1, \theta_2, \ldots$ is highly correlated then knowledge of one
value of $\theta$ implies considerable knowledge of succeeding (and
preceding) values. The correlation is expressed by the density
$f(\theta_t|\theta_{t-1})$, which is derived from the model of the production
process. The decision process thus quantifies the feeling
"When one lot is good the next one usually is good also."

In order to apply the Bayesian method the loss function,
$L(\theta,d)$, and the sampling and process densities, $f(x|\theta)$ and
$f(\theta_t|\theta_{t-1})$, must be specified. These are the subject of the
following sections.


III. LOSS FUNCTIONS


The purpose of this section is to examine various ways
in which the consequences of decisions can be related to the
true value of quality. In the preceding section this
relationship was generally referred to as a loss function, $L(\theta,d)$.
As mentioned previously, the loss when the best decision is
made for a given value of $\theta$ is equal to zero. The loss of a
particular decision $d$ for a given value of $\theta$ is the difference between
the consequences if $d$ is chosen and the consequences if the
best decision were chosen. The loss then essentially
represents a regret or opportunity cost. From the above definition
it is seen that one characteristic of loss functions is that
they are non-negative. Since in the decision process the
risk of sampling is added to the cost of testing, the loss
function and testing cost must be expressed in similar units
(e.g., dollars). In the following examples it is assumed that
the utility of money is linear over the range of interest.
This assumption avoids the otherwise necessary
transformation of the loss in dollars to utility. If the utility of
money is continuous then at least to a first order
approximation the linear assumption is valid. In the following
paragraphs several examples of loss functions are discussed.
Their presence here is not meant to imply that they are in any
way the best or most useful loss functions. The loss function
used for a particular process depends entirely on the
situation at hand.
A. SYMMETRIC LINEAR LOSS FUNCTION

The following example demonstrates that if the decision
consequences are a linear function of $\theta$ then the loss functions
will be linear and symmetric about their intersection. Let
$R(\theta,d_i)$ represent the consequences of decision $d_i$ when $\theta$
obtains. For the linear case

$$R(\theta,d_1) = a_1\theta + b_1 \qquad (21)$$
$$R(\theta,d_2) = a_2\theta + b_2, \quad a_2 > a_1. \qquad (22)$$

Equations (21) and (22) are plotted in Figure 3.
If the consequences are viewed as a cost then the best
decision is that which minimizes $R(\theta,d)$ for all values of $\theta$.
Thus for $\theta < \theta_0$ decision $d_2$ is best and $d_1$ is best for
$\theta > \theta_0$. Then the loss $L(\theta,d_1)$ is

$$L(\theta,d_1) = \begin{cases} a_1\theta + b_1 - (a_2\theta + b_2), & \theta < \theta_0 \\ 0, & \theta \ge \theta_0 \end{cases}$$

or

$$L(\theta,d_1) = \begin{cases} \dfrac{(b_1-b_2)}{(a_2-a_1)}(a_2-a_1) - \theta(a_2-a_1), & \theta < \theta_0 \\ 0, & \theta \ge \theta_0. \end{cases}$$

From (21) and (22), $\theta_0 = \dfrac{(b_1-b_2)}{(a_2-a_1)}$. Thus

$$L(\theta,d_1) = \begin{cases} (a_2-a_1)(\theta_0-\theta), & \theta < \theta_0 \\ 0, & \theta \ge \theta_0. \end{cases} \qquad (23)$$

[Figure 3. Decision Consequences.]
Using similar calculations:

$$L(\theta,d_2) = \begin{cases} 0, & \theta \le \theta_0 \\ (a_2-a_1)(\theta-\theta_0), & \theta > \theta_0. \end{cases} \qquad (24)$$

Equations (23) and (24) are shown in Figure 4.
As evidenced by equations (23) and (24) and depicted in
Figure 4, the loss functions of linear consequences are linear
and symmetric about the intersection.
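The construction of equations (23) and (24) from the consequence lines (21) and (22) can be sketched as follows. The function name and the numerical coefficients are hypothetical; the loss is computed directly as the regret relative to the better decision, which should agree with the closed forms above.

```python
def linear_losses(a1, b1, a2, b2):
    """Return (theta0, loss) for R(theta, d_i) = a_i * theta + b_i, a2 > a1.

    loss(theta, d) is the regret of decision d (1 or 2): its cost minus
    the cost of the better decision at that theta.
    """
    if a2 <= a1:
        raise ValueError("the derivation assumes a2 > a1")
    theta0 = (b1 - b2) / (a2 - a1)  # intersection of the two lines

    def loss(theta, d):
        r1 = a1 * theta + b1
        r2 = a2 * theta + b2
        chosen = r1 if d == 1 else r2
        return chosen - min(r1, r2)

    return theta0, loss

# Hypothetical coefficients.
theta0, loss = linear_losses(a1=1.0, b1=10.0, a2=3.0, b2=4.0)
below = loss(theta0 - 1.0, 1)   # equation (23): (a2 - a1) * (theta0 - theta)
above = loss(theta0 + 1.0, 2)   # equation (24): (a2 - a1) * (theta - theta0)
```

Equal distances on either side of $\theta_0$ give equal losses for the two wrong decisions, which is the symmetry noted in the text.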


[Figure 4. Linear Loss Functions.]

B. QUADRATIC LOSS FUNCTIONS

The quadratic loss function is defined as:

$$L(\theta,d_1) = \begin{cases} C_1(\theta-\theta_0)^2, & \theta < \theta_0 \\ 0, & \theta \ge \theta_0 \end{cases} \qquad (25)$$

$$L(\theta,d_2) = \begin{cases} 0, & \theta < \theta_0 \\ C_2(\theta-\theta_0)^2, & \theta > \theta_0 \end{cases} \qquad (26)$$

A heuristic justification for this general form for a
loss function can be made as follows. Assume for a particular
problem that the loss function $L(\theta,d)$ and all its derivatives
exist. Let $\theta_0$ represent the dividing line between acceptable
and unacceptable quality. If $\theta > \theta_0$ let $d_1$ be the proper
decision and if $\theta < \theta_0$ let $d_2$ be the proper decision. Thus
$L(\theta,d_1) = 0$ for $\theta > \theta_0$ and $L(\theta,d_2) = 0$ for $\theta < \theta_0$. Define
$L(\theta)$ such that $L(\theta) = L(\theta,d_1)$ for $\theta < \theta_0$ and
$L(\theta) = L(\theta,d_2)$ for $\theta > \theta_0$. $L(\theta)$ can be expressed as a Taylor series
expansion about $\theta_0$ as follows:

$$L(\theta) = L(\theta_0) + L'(\theta_0)(\theta-\theta_0) + \frac{L''(\theta_0)}{2}(\theta-\theta_0)^2 + \cdots \qquad (27)$$


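If $\theta = \theta_0$ then either decision $d_1$ or $d_2$ is optimal and the
loss is zero. This implies that $L(\theta_0) = 0$. If $L(\theta_0)$ is
zero then, since losses are non-negative, $\theta_0$ is a minimum for
$L(\theta)$, which implies that $L'(\theta_0) = 0$. Further, if $L(\theta_0)$ is a
minimum then $L''(\theta_0)$ must be non-negative.

Applying these results to the Taylor series expansion
(27) yields:

$$L(\theta) = \frac{L''(\theta_0)}{2}(\theta-\theta_0)^2 + \frac{L'''(\theta_0)}{3!}(\theta-\theta_0)^3 + \cdots$$

Thus the loss function for $\theta$ close to $\theta_0$ can be approximated
by:

$$L(\theta,d_1) = \begin{cases} C_1(\theta-\theta_0)^2, & \theta < \theta_0 \\ 0, & \theta > \theta_0 \end{cases} \qquad (28)$$

$$L(\theta,d_2) = \begin{cases} 0, & \theta < \theta_0 \\ C_2(\theta-\theta_0)^2, & \theta > \theta_0 \end{cases} \qquad (29)$$

which is the quadratic loss function originally defined.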
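Instead of specifying the indifference value $\theta_0$, two values
$\theta_1, \theta_2$ could be specified, where $\theta_1$ would represent the minimum
acceptable quality and $\theta_2$ would represent the maximum
rejectable quality. Then for $d_1$ = accept and $d_2$ = reject the
loss function could be expressed as follows:

$$L(\theta,d_1) = \begin{cases} C_1(\theta-\theta_1)^2, & \theta < \theta_1 \\ 0, & \theta > \theta_1 \end{cases}$$

$$L(\theta,d_2) = \begin{cases} 0, & \theta < \theta_2 \\ C_2(\theta-\theta_2)^2, & \theta > \theta_2 \end{cases}$$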


C. CONSTANT LOSS FUNCTION

The constant loss function represents the case where
the loss is independent of the value of $\theta$ over a specified
range. The constant loss function could be represented as
follows:

$$L(\theta,d_1) = \begin{cases} 1, & \theta < \theta_0 \\ 0, & \theta > \theta_0 \end{cases} \qquad (30)$$

$$L(\theta,d_2) = \begin{cases} 0, & \theta < \theta_0 \\ 1, & \theta > \theta_0 \end{cases} \qquad (31)$$

The expected loss or risk, $\rho(d)$, for decisions $d_1$ and
$d_2$ would be:

$$\rho(d_1) = \int_{-\infty}^{\theta_0} 1 \cdot f(\theta)\, d\theta = P(\theta < \theta_0) = \beta$$
$$\rho(d_2) = \int_{\theta_0}^{\infty} 1 \cdot f(\theta)\, d\theta = P(\theta > \theta_0) = \alpha$$

The risk for decision $d_1$ is the probability that $d_2$
is the correct decision and the risk for $d_2$ is the
probability that $d_1$ is the correct decision. $\rho(d_1)$ and $\rho(d_2)$ are
usually referred to as the probability of type II and type I
errors respectively and denoted by $\beta$ and $\alpha$. It can be
shown [1] that the decision function $\delta(x)$ which minimizes the
risk has the form: if $\dfrac{f(x|\theta_0)}{f(x|\hat\theta)} < c$ then reject,
where $c$ is a constant threshold
and $\hat\theta$ is the maximum likelihood estimate for $\theta$ based on $x$.

This decision function is the generalized likelihood
ratio criterion upon which classical hypothesis testing is
based. Thus quality control procedures based on classical
hypothesis testing imply that the loss functions of (30)
and (31) are operative.
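Under the constant loss of (30) and (31) the two risks are simply tail probabilities of the current distribution of $\theta$. The sketch below assumes, for illustration only, that this distribution is normal; the function names and labels `"d1"` and `"d2"` are hypothetical.

```python
from statistics import NormalDist

def constant_loss_risks(mean, sd, theta0):
    """Risks under the 0-1 loss of (30) and (31) for theta ~ N(mean, sd^2)."""
    beta = NormalDist(mean, sd).cdf(theta0)  # rho(d1) = P(theta < theta0)
    alpha = 1.0 - beta                       # rho(d2) = P(theta > theta0)
    return beta, alpha

def bayes_choice(mean, sd, theta0):
    """Pick the decision with the smaller risk, as in equation (2)."""
    beta, alpha = constant_loss_risks(mean, sd, theta0)
    return ("d1", beta) if beta <= alpha else ("d2", alpha)

# Hypothetical values.
balanced = constant_loss_risks(10.0, 1.0, 10.0)
high_quality = bayes_choice(11.0, 1.0, 10.0)
low_quality = bayes_choice(9.0, 1.0, 10.0)
```

The two risks always sum to one here, so the Bayes decision under this loss is determined by which side of $\theta_0$ carries more probability.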

This concludes the discussion of the most common types
of loss functions. The next section considers the problem
of formulating a statistical model of the production process.
IV. PRODUCTION MODEL


As mentioned previously, in order to apply the Bayesian
decision method the conditional density $f(\theta_t|\theta_{t-1})$ must be
known or estimated. In order to specify this density the
quality control specialist must specify the underlying
mechanisms which determine how the production process is
evolving over time. In the following models two assumptions
will be made: (a) that the process is determined by a
specific underlying relationship plus random disturbances
and (b) that by the central limit theorem, the ghost of
Laplace or some other incantation, the disturbances are
normally distributed with known mean and
variance. In the following paragraphs three models which
might be used to characterize a production process are
described. They are the linear trend model, the
autoregressive model and the periodic model. The generalized linear
model is also presented. In addition to the process models
a typical observation model will be included for completeness.
For convenience and to aid in the later development of the
solutions for the generalized linear model it is also assumed
that the observation errors are normally distributed.







A. LINEAR TREND MODEL

The linear trend model represents a production process
where the underlying trend is a linear increase or decrease
in the characteristic $\theta$. Let $\delta_t$ represent the change of $\theta$
from $t-1$ to $t$ and $x_t$ represent the observations at time $t$.
Then the observation and process models could be described
as:

OBSERVATION: $x_t = \theta_t + e_t$, $e_t \sim N(0,\sigma_e^2)$   (32)
PROCESS: $\theta_t = \theta_{t-1} + \delta_t + w_t$, $w_t \sim N(0,\sigma_w^2)$   (33)

In the above model $e_t$ represents observation noise and
$w_t$ represents the process noise causing deviations from the
linear relationship. From the model, the conditional density
$f(\theta_t|\theta_{t-1})$ would be normal with mean $E(\theta_t) = \theta_{t-1} + \delta_t$ and
variance equal to $\sigma_w^2$. In order to apply the Bayesian
procedure the increment $\delta_t$ and the variance $\sigma_w^2$ must be known or
estimated from the process. If the uncertainty in $\delta_t$ is
incorporated in the model the result is

$\theta_t = \theta_{t-1} + \delta_t + w_t$, $w_t \sim N(0,\sigma_w^2)$;

then $f(\theta_t|\theta_{t-1}) \sim N(\theta_{t-1} + E(\delta_t),\ \sigma_w^2 + \sigma_\delta^2)$, with $\sigma_\delta^2$ the
variance of $\delta_t$, assuming $\delta_t$ and $w_t$
are uncorrelated.
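The linear trend model of (32) and (33) is straightforward to simulate, which can help when judging whether it is a plausible description of a given process. The following sketch uses hypothetical parameter names and values and a fixed, known increment `delta`; `transition_moments` returns the mean and variance of the conditional density described above, with `sd_delta` an optional standard deviation for an uncertain increment.

```python
import random

def simulate_linear_trend(theta0, delta, sd_w, sd_e, steps, seed=0):
    """Generate (thetas, xs) from equations (33) and (32)."""
    rng = random.Random(seed)
    thetas, xs = [], []
    theta = theta0
    for _ in range(steps):
        theta = theta + delta + rng.gauss(0.0, sd_w)  # process, equation (33)
        thetas.append(theta)
        xs.append(theta + rng.gauss(0.0, sd_e))       # observation, equation (32)
    return thetas, xs

def transition_moments(theta_prev, delta, sd_w, sd_delta=0.0):
    """Mean and variance of f(theta_t | theta_{t-1}).

    sd_delta > 0 models an uncertain increment that is uncorrelated
    with the process noise, so the variances add.
    """
    return theta_prev + delta, sd_w ** 2 + sd_delta ** 2

# Hypothetical values: a noise-free run shows the pure trend.
trend_only, readings = simulate_linear_trend(1.0, 0.5, 0.0, 0.0, 3)
mean, var = transition_moments(2.0, 0.5, 0.3, 0.4)
```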
B. AUTOREGRESSIVE MODEL

The basic autoregressive model results from the following
assumptions about the production process.

1. $f(\theta_t) \sim N(0,\sigma^2)$ for all $t$
2. $f(\theta_t,\theta_{t-1}) \sim N(0,C)$ for all $t$, $C = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}$

Assumption 1 indicates that at any time $t$ the uncertainty
in the location of $\theta_t$ can be expressed as a normal probability
density about the overall process mean (assumed to be zero
in this case) with process variance $\sigma^2$ constant for all $t$.
Assumption 2 indicates that the values of $\theta$ at successive
times are not independent and their joint density is
bivariate normal with correlation coefficient $\rho$ (distinct from
the risk $\rho(\cdot)$ of Section II). The
observation and process model for the autoregressive process can
be expressed as:

OBSERVATION: $x_t = \theta_t + e_t$, $e_t \sim N(0,\sigma_e^2)$   (34)
PROCESS: $\theta_t = \rho\theta_{t-1} + w_t$, $w_t \sim N(0,\sigma^2(1-\rho^2))$   (35)

This model is useful when there appears to be no
underlying trend either up or down in the process and the quality
of successive lots appears to be highly correlated. From
the above model

$f(\theta_t|\theta_{t-1}) \sim N(\rho\theta_{t-1},\ \sigma^2(1-\rho^2))$.
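The choice of process noise variance $\sigma^2(1-\rho^2)$ in (35) is what keeps the marginal variance of $\theta_t$ equal to $\sigma^2$ at every step. The sketch below, with hypothetical function names, checks this and also gives the mean and variance of $\theta_{t+k}$ given $\theta_t$, obtained by applying the one-step transition $k$ times. The $k$-step closed form is a standard consequence of (35) rather than a formula stated in the text.

```python
def ar_iterate(theta_now, rho, sigma, k):
    """Apply the one-step transition of (35) k times to a known theta_t."""
    mean, var = theta_now, 0.0
    for _ in range(k):
        mean = rho * mean
        var = rho ** 2 * var + sigma ** 2 * (1.0 - rho ** 2)
    return mean, var

def ar_k_step(theta_now, rho, sigma, k):
    """Closed form: mean rho^k * theta_t, variance sigma^2 * (1 - rho^(2k))."""
    return rho ** k * theta_now, sigma ** 2 * (1.0 - rho ** (2 * k))

def marginal_variance_after_step(var_prev, rho, sigma):
    """Variance of theta_t when theta_{t-1} has variance var_prev."""
    return rho ** 2 * var_prev + sigma ** 2 * (1.0 - rho ** 2)

# Hypothetical values.
iterated = ar_iterate(1.5, 0.9, 2.0, 5)
closed = ar_k_step(1.5, 0.9, 2.0, 5)
stationary = marginal_variance_after_step(4.0, 0.9, 2.0)
```

The closer the correlation is to one, the more slowly the variance returns to $\sigma^2$, which is why knowledge of one lot carries over to several succeeding lots in a highly correlated process.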
C. PERIODIC MODEL

If the production process behaves in periodic fashion
over some interval T then a periodic model is appropriate.
Let $\theta(t)$ represent the periodic function which describes the
production process and let the average value of the periodic
function be zero (i.e., $\frac{1}{T}\int_0^T \theta(t)\,dt = 0$). Then the periodic
function $\theta(t)$ can be approximated by the Fourier series as

$\theta(t) = \sum_{k=1}^{n} (a_k \cos k\omega_0 t + b_k \sin k\omega_0 t), \quad \omega_0 = \frac{2\pi}{T}$

where the coefficients $a_k$ and $b_k$ are unknown and subject to
disturbances, and n is large enough to make the
approximation valid.
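Let $\theta^T = (a_1\ a_2\ \cdots\ a_n\ b_1\ b_2\ \cdots\ b_n)$
and $B_t^T = (\cos\omega_0 t\ \ \cos 2\omega_0 t\ \cdots\ \cos n\omega_0 t\ \ \sin\omega_0 t\ \cdots\ \sin n\omega_0 t)$

Then the observation and process models are:

OBSERVATION: $x_t = B_t^T \theta_t + e_t, \quad e_t \sim N(0, \sigma_e^2)$    (36)
PROCESS: $\theta_t = \theta_{t-1} + w_t, \quad w_t \sim N(0, \Sigma)$    (37)

where 0 is a 2n×1 vector of zeros and $\Sigma = E(w_t w_t^T)$.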
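As a hedged illustration of how the observation row vector in (36) might be assembled, the sketch below builds the 2n cosine and sine regressors for a given time and evaluates the noise-free process level. The function names are hypothetical, and the fundamental frequency $\omega_0 = 2\pi/T$ is assumed.

```python
import math

def fourier_regressor(t, period, n_harmonics):
    """Return [cos(w0 t), ..., cos(n w0 t), sin(w0 t), ..., sin(n w0 t)]."""
    w0 = 2.0 * math.pi / period  # assumed fundamental frequency 2*pi/T
    cos_terms = [math.cos(k * w0 * t) for k in range(1, n_harmonics + 1)]
    sin_terms = [math.sin(k * w0 * t) for k in range(1, n_harmonics + 1)]
    return cos_terms + sin_terms

def process_level(t, period, a, b):
    """Noise-free level: inner product of the regressor with theta = (a, b)."""
    theta = list(a) + list(b)
    regressor = fourier_regressor(t, period, len(a))
    return sum(r * c for r, c in zip(regressor, theta))

# Illustrative use with one harmonic and a period of 12 lots (assumed values).
level_at_quarter_period = process_level(3.0, 12.0, a=[1.0], b=[2.0])
```

Because the regressor changes with t while the coefficient vector evolves only through the process noise, the periodic model fits the linear form with a time-varying observation matrix.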


D. GENERAL LINEAR MODEL

The above models represent special cases of the general- 
ized linear model incorporating special features to reflect 
particular characteristics of the production process. The 


generalized linear model is defined as follows: 
Let

$x_t$ = n×1 vector of observations at time t
$\theta_t$ = p×1 vector of process parameters at time t
$A_1$ = n×p matrix characterizing the observation
$A_2$ = p×p matrix characterizing the process
$e_t$ = n×1 vector of observation noise at time t
$w_t$ = p×1 vector of process noise at time t

$E[e_t] = E[w_t] = 0$
$\mathrm{VAR}(e_t) = E[e_t e_t^T] = C_1$
$\mathrm{VAR}(w_t) = E[w_t w_t^T] = C_2$

Then

OBSERVATION: $x_t = A_1 \theta_t + e_t, \quad e_t \sim N(0, C_1)$    (38)
PROCESS: $\theta_t = A_2 \theta_{t-1} + w_t, \quad w_t \sim N(0, C_2)$    (39)

is the generalized linear model.

In order to implement the Bayesian decision procedure 
various probability densities are required to determine the 
risks of alternative actions. From section II it can be seen 
that three densities must be obtained in the course of the 
procedure. In order to obtain the risk of immediate decision 
without sampling the prior distribution of $\theta$, $f(\theta)$, must be
obtained. To obtain the risk of sampling the prior distribution
of x, $f(x)$, and the conditional distribution of $\theta$ given a
sample x, $f(\theta \mid x)$, must be obtained. In order to determine the
optimal decision after sampling $f(\theta \mid x)$ is required. Thus to
implement the decision procedure three densities must be
obtained at each step in the process. If the observation and
production process can be modeled by the generalized linear
model of (38) and (39) the required densities can be obtained
as follows [4].


Let: $f(\theta_t)$ be the density of $\theta$ at time t prior to
     sampling
     $f(\theta_t \mid x)$ be the posterior of $\theta$ at time t based
     on the sample x
     $f(x_t)$ be the sample distribution prior to sampling

Also let the distribution of $\theta$ at t-1 be $N(\mu, \Sigma)$.

Then from (38) and (39)

$f(\theta_t) \sim N(A_2\mu,\ A_2 \Sigma A_2^T + C_2)$    (40)

$f(x_t) \sim N(A_1 A_2 \mu,\ A_1 (A_2 \Sigma A_2^T + C_2) A_1^T + C_1)$    (41)


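$f(\theta_t \mid x_t)$ is obtained as follows (where the matrices for which
inverses are needed are assumed non-singular). From Bayes
theorem $f(\theta_t \mid x_t) \propto f(x_t \mid \theta_t)\, f(\theta_t)$

From the model $f(x_t \mid \theta_t) \sim N(A_1 \theta_t,\ C_1)$

Thus $f(\theta_t \mid x_t) \propto e^{-\frac{1}{2} Q}$

where $Q = (x - A_1\theta)^T C_1^{-1} (x - A_1\theta) + (\theta - A_2\mu)^T (A_2 \Sigma A_2^T + C_2)^{-1} (\theta - A_2\mu)$

the t subscripts being deleted for convenience. Collecting
the $\theta$ terms yields: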
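$Q = \theta^T (A_1^T C_1^{-1} A_1 + \Lambda)\, \theta - 2\,\theta^T (A_1^T C_1^{-1} x + \Lambda A_2 \mu) + x^T C_1^{-1} x + \mu^T A_2^T \Lambda A_2 \mu$

where $\Lambda = (A_2 \Sigma A_2^T + C_2)^{-1}$

Completing the square results in:

$Q = (\theta_t - Dd)^T D^{-1} (\theta_t - Dd) + [\,x^T C_1^{-1} x + \mu^T A_2^T \Lambda A_2 \mu - d^T D d\,]$

Thus $f(\theta_t \mid x_t) \sim N(Dd,\ D)$    (42)

where $D^{-1} = A_1^T C_1^{-1} A_1 + (A_2 \Sigma A_2^T + C_2)^{-1}$

$d = A_1^T C_1^{-1} x + (A_2 \Sigma A_2^T + C_2)^{-1} A_2 \mu$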
Thus with the aid of the generalized linear model the required 
densities for complex multidimensional production processes 
can be evaluated in a straightforward manner using (40), (41) 
and (42). 
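The three densities can be computed mechanically once the model matrices are given. The sketch below is one possible NumPy rendering of (40), (41) and (42); the function names `predict` and `posterior` and the numerical values are assumptions made for illustration, and the matrices to be inverted are assumed non-singular as in the text.

```python
import numpy as np

def predict(mu, Sigma, A1, A2, C1, C2):
    """Prior densities of theta_t and x_t, equations (40) and (41)."""
    theta_mean = A2 @ mu
    theta_cov = A2 @ Sigma @ A2.T + C2
    x_mean = A1 @ theta_mean
    x_cov = A1 @ theta_cov @ A1.T + C1
    return theta_mean, theta_cov, x_mean, x_cov

def posterior(x, mu, Sigma, A1, A2, C1, C2):
    """Posterior N(Dd, D) of theta_t given the sample x, equation (42)."""
    prior_precision = np.linalg.inv(A2 @ Sigma @ A2.T + C2)
    C1_inv = np.linalg.inv(C1)
    D = np.linalg.inv(A1.T @ C1_inv @ A1 + prior_precision)
    d = A1.T @ C1_inv @ x + prior_precision @ (A2 @ mu)
    return D @ d, D

# Scalar autoregressive case written as 1x1 matrices (illustrative values).
A1 = np.array([[1.0]]); A2 = np.array([[0.8]])
C1 = np.array([[1.0]]); C2 = np.array([[4.0 * (1 - 0.8 ** 2)]])
mu = np.array([1.0]); Sigma = np.array([[4.0]])

theta_mean, theta_cov, x_mean, x_cov = predict(mu, Sigma, A1, A2, C1, C2)
post_mean, post_cov = posterior(np.array([2.0]), mu, Sigma, A1, A2, C1, C2)
```

Explicit inverses are used here only to mirror the formulas; for larger or poorly conditioned problems a linear solve would normally be preferred.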

Equations (40) and (41) represent the one step ahead
predictive distributions of $\theta$ and x based on the prior at t-1.
As discussed in section II, if no sampling is performed at
some t then the predictive distribution for $\theta$ at t+1 based on
the prior at t-1 is required. In general, a method is needed
to obtain the kth step ahead predictive distributions of $\theta$
and x.

Let $f(\theta_t) \sim N(\mu_t, \Sigma_t)$; then the kth step ahead distribution
of $\theta$, $f(\theta_{t+k}) \sim N(\mu_{t+k}, \Sigma_{t+k})$, can be obtained recursively from the
linear model as follows:
$\mu_{t+k} = A_2\, \mu_{t+k-1}, \quad k = 1, 2, \ldots$    (43)

$\Sigma_{t+k} = A_2\, \Sigma_{t+k-1} A_2^T + C_2, \quad k = 1, 2, \ldots$    (44)

Let $f(x_{t+k}) \sim N(m_{t+k}, C_{t+k})$ be the kth step ahead predictive
distribution for the sample $x_{t+k}$; then from (43) and (44) and
the linear model, $m_{t+k}$ and $C_{t+k}$ can be determined recursively
by:

$m_{t+k} = A_1\, \mu_{t+k}$    (45)

$C_{t+k} = A_1\, \Sigma_{t+k} A_1^T + C_1$    (46)


For an example of the use of the linear model and the 
risk calculations the reader is referred to Appendix A. As 
an interesting aside, the recursion relationships developed in 
(42) thru (46) are identical to the results obtained using 


Kalman filtering. 
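The correspondence with the Kalman filter can be made explicit. As a sketch, write $P$ and $m$ for the prior covariance and mean of (40) and $K$ for the Kalman gain (these three symbols are introduced here for illustration and do not appear in the original). The matrix inversion lemma then gives the posterior of (42) in the familiar gain form:

```latex
\begin{aligned}
P  &= A_2 \Sigma A_2^T + C_2, \qquad m = A_2 \mu
   && \text{(prediction, eq. (40))} \\
K  &= P A_1^T \,(A_1 P A_1^T + C_1)^{-1}
   && \text{(Kalman gain)} \\
D  &= (A_1^T C_1^{-1} A_1 + P^{-1})^{-1} = P - K A_1 P
   && \text{(posterior covariance)} \\
Dd &= D\,(A_1^T C_1^{-1} x + P^{-1} m) = m + K\,(x - A_1 m)
   && \text{(posterior mean)}
\end{aligned}
```

The posterior mean is thus the predicted mean corrected by the gain times the difference between the sample and its predicted value.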







V. INCORPORATING SUBJECTIVE INFORMATION 


One of the primary advantages of the Bayesian decision 
procedure is its capacity to incorporate subjective informa- 
tion into the decision process. Subjective information can 
be incorporated into the decision process by either of two
routes: by revising the process model or by
altering the prior distribution of $\theta$. The method chosen
depends on which more accurately reflects the subjective
information. Examples of how subjective information may be
used will be discussed with respect to the generalized linear 


model which is repeated here for reference 


OBSERVATION: $x_t = A_1 \theta_t + e_t, \quad e_t \sim N(0, C_1)$

PROCESS: $\theta_t = A_2 \theta_{t-1} + w_t, \quad w_t \sim N(0, C_2)$

As an example of the use of subjective information, suppose 
that the autoregressive model of section IV is being used to 
model the production process and production appears to be 

fairly stable (i.e., no trends). You are informed that
starting with the next production lot three engineering
changes will be incorporated into the units. It has been your
experience that whenever more than one engineering change is
incorporated the production quality is momentarily
reduced and then increases with successive lots as the new
procedures are learned and the inspectors gain experience.
How might this subjective information be incorporated into
the decision process? One approach would be to adjust the
prior for the first change lot by lowering the mean to
reflect the anticipated decrease in quality and increasing
the variance to reflect the associated uncertainty as to the
actual process value. Increasing the variance will have the
effect of weighting the new sample data more heavily in
determining the posterior. After adjusting the prior to
reflect the anticipated decrease in quality the autoregressive
model might be replaced by the linear trend model to
reflect the anticipated increase of quality with successive
lots as a result of the learning process. The rate of
increase $\delta_t$ in the linear model could be changed for each lot
to provide a linear approximation to the anticipated learning
curve. As the process quality returned to its original
level the autoregressive model would again be used.
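The effect of inflating the prior variance can be seen in the scalar case, where the posterior mean is a weighted average of the prior mean and the sample mean. The sketch below uses the standard normal prior and normal sample result; the function name and all numerical values are hypothetical and chosen only to illustrate the adjustment described above.

```python
def posterior_mean_var(prior_mean, prior_var, xbar, n, obs_var):
    """Scalar normal posterior after n independent samples with mean xbar.

    Returns the posterior mean, posterior variance, and the weight that
    the posterior mean places on the sample mean.
    """
    weight = n * prior_var / (obs_var + n * prior_var)
    mean = (1.0 - weight) * prior_mean + weight * xbar
    var = obs_var * prior_var / (obs_var + n * prior_var)
    return mean, var, weight

# Before the engineering changes: a tight prior centered on the usual level.
usual = posterior_mean_var(prior_mean=0.0, prior_var=1.0, xbar=-2.0, n=4, obs_var=4.0)

# First change lot: mean lowered and variance increased (subjective adjustment).
adjusted = posterior_mean_var(prior_mean=-1.0, prior_var=4.0, xbar=-2.0, n=4, obs_var=4.0)
```

In this illustration the weight on the sample mean rises from 0.5 to 0.8 when the prior variance is increased from 1 to 4, so the same four observations move the posterior further toward the data.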

This example illustrates two important features of the 
Bayesian decision procedure and the use of the linear model. 
First, when using a linear model it is not necessary that
$A_1$, $A_2$, $C_1$ and $C_2$ be constant for all t, only that they be
known at time t, and thus the parameters of the linear model
are free to change as required by the process being modeled.
The second feature of the Bayesian procedure illustrated in 
the example is adaptability. By changing the model structure 


or the prior to reflect uncertainty in the process, the
information requirements (sample data) of the procedure
adapt to reflect these changes. In order to maintain the
same risk more samples will be required if the variance of 
the prior is increased to reflect uncertainty. Thus unlike 
traditional quality control procedures where the same sampling 
and decision procedure is used the Bayesian method can adapt 
to reflect the changing requirements of the production 
process. As another example of adaptability consider the 
autoregressive model. Because of the high correlation from 
one lot to the next the method reduces the sampling required 
when quality is either very good or very poor, thus taking 
advantage of the natural excursions of the output quality. 
The adaptability feature of the Bayesian method also
has another interesting property. It indicates where, when,
and in what quantity quality control resources should be used.
This is especially important when trying to maximize the
effectiveness of the quality control function on fixed or
limited resources (e.g., labor, test facilities).
APPENDIX A
A BAYESIAN DECISION EXAMPLE 


The following example is provided to illustrate how the 
Bayesian method is applied and the required risks are 
obtained. It is assumed that an autoregressive model is used 
to represent the production process and that simultaneous 


sampling is used.


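Let: OBSERVATION MODEL: $x_t = \theta_t + e_t, \quad e_t \sim N(0, \sigma_e^2)$
     PROCESS MODEL: $\theta_t = \rho\,\theta_{t-1} + w_t, \quad w_t \sim N(0, \sigma^2(1-\rho^2))$

and the loss function is:

$L(\theta, d_1) = \exp[-(\theta - \theta_0)], \quad -\infty < \theta < \infty$
$L(\theta, d_2) = \exp[-(\theta_0 - \theta)], \quad -\infty < \theta < \infty$

where $\theta_0$ is a known constant.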


Also let $f(\theta_{t-1}) \sim N(\mu_{t-1}, \sigma_{t-1}^2)$ be the density of $\theta$ at time
t-1.

From section IV the correspondence between the autoregressive
model and the generalized linear model is:

$A_1 = 1$, $A_2 = \rho$, $C_1 = \sigma_e^2$, and $C_2 = \sigma^2(1-\rho^2)$. Thus by
equations (40) and (41), making the above substitutions:
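$f(\theta_t) \sim N(\rho\mu_{t-1},\ \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2)$

$f(x_t) \sim N(\rho\mu_{t-1},\ \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2 + \sigma_e^2)$

Let $\mu_t = \rho\mu_{t-1}$ and $\sigma_t^2 = \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2$;
then $f(\theta_t) \sim N(\mu_t,\ \sigma_t^2)$    (1)

$f(x_t) \sim N(\mu_t,\ \sigma_t^2 + \sigma_e^2)$    (2)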
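To determine the risk of making an immediate decision, from
equation (10)

$\rho(d_1) = \int L(\theta, d_1)\, f(\theta_t)\, d\theta_t = \exp[\tfrac{1}{2}\sigma_t^2 - (\mu_t - \theta_0)]$    (3)

and

$\rho(d_2) = \int L(\theta, d_2)\, f(\theta_t)\, d\theta_t = \exp[\tfrac{1}{2}\sigma_t^2 + (\mu_t - \theta_0)]$    (4)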





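Equations (3) and (4) are plotted in figure A-1 as a function
of the mean $\mu_t$.
Since $\rho(d^*)$, the risk of the optimal decision, is equal to
the minimum risk, from figure A-1 it is observed that:

$\rho(d^*) = \begin{cases} \rho(d_1), & \mu_t \ge \theta_0 \\ \rho(d_2), & \mu_t < \theta_0 \end{cases}$    (5)

which implies the following decision rule:

$d^* = \begin{cases} d_1 & \text{if } \mu_t \ge \theta_0 \\ d_2 & \text{if } \mu_t < \theta_0 \end{cases}$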
Figure A-1. Decision Risk.
By symmetry the risk of decision $d^*$ is

$\rho(d^*) = \exp[\tfrac{1}{2}\sigma_t^2 - |\mu_t - \theta_0|]$
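The immediate-decision risks and the resulting rule are simple enough to state in a few lines. The following sketch assumes the exponential loss functions of this appendix; the function names and the labels "d1" and "d2" are illustrative.

```python
import math

def immediate_risks(mu_t, var_t, theta0):
    """Risks of d1 and d2 without sampling, equations (3) and (4)."""
    rho_d1 = math.exp(var_t / 2.0 - (mu_t - theta0))
    rho_d2 = math.exp(var_t / 2.0 + (mu_t - theta0))
    return rho_d1, rho_d2

def optimal_decision(mu_t, var_t, theta0):
    """Choose the decision with the smaller risk, as in equation (5)."""
    rho_d1, rho_d2 = immediate_risks(mu_t, var_t, theta0)
    if mu_t >= theta0:
        return "d1", rho_d1
    return "d2", rho_d2

# Illustrative values: prior mean one unit above theta0, prior variance 0.5.
decision, risk = optimal_decision(mu_t=1.0, var_t=0.5, theta0=0.0)
```

Either branch returns $\exp[\tfrac{1}{2}\sigma_t^2 - |\mu_t - \theta_0|]$, which is the symmetric form of the optimal risk.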


In order to determine the risk of sampling and then making
a decision $d^*$ the posterior $f(\theta_t \mid x)$ must be obtained. The
process of testing a sample of size n parameterized by a
fixed but unknown value $\theta_t$ can be viewed as:

$x_i = \theta_t + e_i, \quad e_i \sim N(0, \sigma_e^2), \quad i = 1, \ldots, n$

where the samples are assumed independent. Applying the
linear model: $A_1 = 1$, $A_2 = 1$, $C_1 = \sigma_e^2$, and $C_2 = 0$.
By repeated application of equation (40) one obtains:


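$f(\theta_t \mid \bar{x}) \sim N\!\left( \frac{\sigma_e^2 \mu_t + n \sigma_t^2 \bar{x}}{\sigma_e^2 + n \sigma_t^2},\ \frac{\sigma_e^2 \sigma_t^2}{\sigma_e^2 + n \sigma_t^2} \right)$

where: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$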


From section II the risk of obtaining an additional sample
and then choosing $d^*$ is:

$\rho(x) = \int_x \left[ \min_d \int_\theta L(\theta, d)\, f(\theta \mid x)\, d\theta \right] f(x)\, dx$    (6)
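The expression in brackets is $\rho(d^* \mid x)$, that is, the risk of
$d^*$ given a sample result x. $\rho(d_1 \mid x)$ and $\rho(d_2 \mid x)$ may be
obtained by substituting $f(\theta_t \mid x)$ for $f(\theta_t)$ in equations (3)
and (4). This substitution yields:

$\rho(d_1 \mid \bar{x}) = \exp[\tfrac{1}{2}\hat{\sigma}^2 - (\hat{\theta} - \theta_0)]$

$\rho(d_2 \mid \bar{x}) = \exp[\tfrac{1}{2}\hat{\sigma}^2 + (\hat{\theta} - \theta_0)]$

where

$\hat{\theta} = \frac{\sigma_e^2 \mu_t + n\sigma_t^2 \bar{x}}{\sigma_e^2 + n\sigma_t^2}, \qquad \hat{\sigma}^2 = \frac{\sigma_e^2 \sigma_t^2}{\sigma_e^2 + n\sigma_t^2}$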
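From equation (5), with $\hat{\theta}$ the mean of $f(\theta_t \mid \bar{x})$,

$\rho(d^* \mid \bar{x}) = \begin{cases} \rho(d_1 \mid \bar{x}), & \hat{\theta} \ge \theta_0 \\ \rho(d_2 \mid \bar{x}), & \hat{\theta} < \theta_0 \end{cases}$

Equation (6) becomes:

$\rho(x) = \int_{-\infty}^{x_c} \rho(d_2 \mid \bar{x})\, f(\bar{x})\, d\bar{x} + \int_{x_c}^{\infty} \rho(d_1 \mid \bar{x})\, f(\bar{x})\, d\bar{x}$    (7)

where $x_c = \frac{\sigma_e^2 (\theta_0 - \mu_t)}{n \sigma_t^2} + \theta_0$ and $f(\bar{x}) \sim N(\mu_t,\ \sigma_t^2 + \sigma_e^2/n)$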
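Solving (7) yields

$\rho(x) = \rho(d_1)\, \bar{\Phi}\!\left[ \frac{x_c - \mu_t + \sigma_t^2}{\sigma_{\bar{x}}} \right] + \rho(d_2)\, \Phi\!\left[ \frac{x_c - \mu_t - \sigma_t^2}{\sigma_{\bar{x}}} \right]$    (8)

where: $\rho(d_1)$ and $\rho(d_2)$ are as defined in (3) and (4)

$\sigma_{\bar{x}}^2 = \sigma_t^2 + \sigma_e^2/n$

$\Phi(\cdot) = P(z < \cdot), \quad z \sim N(0,1)$

$\bar{\Phi}(\cdot) = 1 - \Phi(\cdot)$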


Based on equations (5) and (8) the usual decision rule is
applied. If $\rho(d^*) > \rho(x) + E[C(x)]$ take another sample;
otherwise make decision $d^*$.
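The sampling risk and the stopping rule can be checked numerically. The sketch below evaluates the risk of sampling in closed form, using the normal distribution function written with `math.erf`, and compares it with a brute-force integration over the sample mean. The closed form used here is the one obtained by carrying out the two integrals in (7) under the exponential loss; the function names, the fixed `sampling_cost` argument standing in for E[C(x)], and the numerical values are all assumptions for illustration.

```python
import math

def phi(z):
    """Standard normal distribution function P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sampling_risk(mu_t, var_t, theta0, n, obs_var):
    """Closed-form risk of taking n samples and then choosing d*."""
    sd_xbar = math.sqrt(var_t + obs_var / n)
    x_c = obs_var * (theta0 - mu_t) / (n * var_t) + theta0
    rho_d1 = math.exp(var_t / 2.0 - (mu_t - theta0))
    rho_d2 = math.exp(var_t / 2.0 + (mu_t - theta0))
    return (rho_d1 * (1.0 - phi((x_c - mu_t + var_t) / sd_xbar))
            + rho_d2 * phi((x_c - mu_t - var_t) / sd_xbar))

def sampling_risk_numeric(mu_t, var_t, theta0, n, obs_var, steps=20000):
    """Midpoint-rule integration of the posterior optimal risk over xbar."""
    sd_xbar = math.sqrt(var_t + obs_var / n)
    post_var = obs_var * var_t / (obs_var + n * var_t)
    lo = mu_t - 10.0 * sd_xbar
    h = 20.0 * sd_xbar / steps
    total = 0.0
    for i in range(steps):
        xbar = lo + (i + 0.5) * h
        post_mean = (obs_var * mu_t + n * var_t * xbar) / (obs_var + n * var_t)
        risk = math.exp(post_var / 2.0 - abs(post_mean - theta0))
        dens = math.exp(-0.5 * ((xbar - mu_t) / sd_xbar) ** 2) / (sd_xbar * math.sqrt(2.0 * math.pi))
        total += risk * dens * h
    return total

def should_sample(mu_t, var_t, theta0, n, obs_var, sampling_cost):
    """Sample if the immediate optimal risk exceeds sampling risk plus cost."""
    immediate = math.exp(var_t / 2.0 - abs(mu_t - theta0))
    return immediate > sampling_risk(mu_t, var_t, theta0, n, obs_var) + sampling_cost

closed = sampling_risk(0.3, 1.0, 0.0, 4, 2.0)
numeric = sampling_risk_numeric(0.3, 1.0, 0.0, 4, 2.0)
```

Agreement between `closed` and `numeric` is a useful guard against sign errors in the arguments of the distribution function.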


To determine the mean and variance of the kth step ahead
predictive distribution, $\mu_{t+k}$ and $\sigma_{t+k}^2$, equations (43) and (44)
are used recursively to obtain the following results.

$\mu_{t+k} = \rho^k \mu_t$    (9)

$\sigma_{t+k}^2 = \rho^{2k} \sigma_t^2 + (1 - \rho^{2k})\, \sigma^2$    (10)

The predictive distribution for the kth step ahead sample,
parameterized by $m_{t+k}$ and $C_{t+k}$, is obtained from (45) and (46) as:

$m_{t+k} = \mu_{t+k}$

$C_{t+k} = \sigma_{t+k}^2 + \sigma_e^2$
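The closed forms for the kth step ahead mean and variance follow from iterating the one-step recursion for the autoregressive model. As a small check, the sketch below runs the scalar recursion k times and compares it with the closed-form expressions; the function names and values are illustrative assumptions.

```python
def k_step_recursive(mu_t, var_t, rho, sigma2, k):
    """Apply the scalar forms of (43) and (44) k times with A2 = rho."""
    mu, var = mu_t, var_t
    for _ in range(k):
        mu = rho * mu
        var = rho ** 2 * var + (1.0 - rho ** 2) * sigma2
    return mu, var

def k_step_closed_form(mu_t, var_t, rho, sigma2, k):
    """Closed-form kth step ahead mean and variance."""
    decay = rho ** (2 * k)
    return rho ** k * mu_t, decay * var_t + (1.0 - decay) * sigma2

# Illustrative values: posterior N(1.5, 0.3) at time t, rho = 0.8, sigma^2 = 4.
recursive = k_step_recursive(1.5, 0.3, 0.8, 4.0, k=5)
closed_form = k_step_closed_form(1.5, 0.3, 0.8, 4.0, k=5)
far_ahead = k_step_closed_form(1.5, 0.3, 0.8, 4.0, k=400)
```

For large k the mean decays to zero and the variance returns to the process variance, which is the loss of information with lead time discussed in this appendix.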


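In order to examine the behavior of the posterior of $\theta_t$,
$f(\theta_t \mid \bar{x})$, as the sample size is increased observe that

$f(\theta_t \mid \bar{x}) \sim N\!\left( \frac{(\sigma_e^2/n)\,\mu_t + \sigma_t^2\, \bar{x}}{(\sigma_e^2/n) + \sigma_t^2},\ \frac{(\sigma_e^2/n)\, \sigma_t^2}{(\sigma_e^2/n) + \sigma_t^2} \right)$

Thus $\lim_{n \to \infty} f(\theta_t \mid \bar{x}) \to N(\theta_t, 0)$
since $\bar{x}$ is a consistent estimator of $\theta_t$. This implies that
as n increases the knowledge of $\theta_t$ as represented by $f(\theta_t \mid \bar{x})$
becomes "perfect" in the sense that the variance approaches
zero and $\theta_t$ is then known. The variance decreases approximately
as $1/n$ for $\sigma_t^2 \gg \sigma_e^2/n$.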
The kth step ahead prediction distribution, $N(\mu_{t+k}, \sigma_{t+k}^2)$ of
(9) and (10), represents the uncertainty in $\theta_{t+k}$ based on
information up to and including time t. The $\lim_{k \to \infty} N(\mu_{t+k}, \sigma_{t+k}^2) \sim N(0, \sigma^2)$,
which is the distribution of the process before any
information is obtained. Thus as k increases the information
obtained at time t loses its "value" in predicting the
location of the process at t+k. The rate at which previous
knowledge is discounted is a function of $\rho$, the correlation
between $\theta_{t+1}$ and $\theta_t$, which in this example was assumed
constant for all t.